Asymmetric Gradient Boosting with Application to Spam Filtering
Authors
Abstract
In this paper, we propose a new asymmetric boosting method, Boosting with Different Costs. Traditional boosting methods assign the same cost to misclassified instances regardless of their class, and therefore optimize for overall accuracy. Our method is more general and is designed for problems where the main concern is a low false positive (or false negative) rate, such as spam filtering. Experimental results on a large-scale email spam data set show that our method outperforms state-of-the-art techniques.
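The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of asymmetric boosting, not the authors' Boosting with Different Costs method. It assumes one generic approach: gradient boosting on a logistic loss where each example is weighted by a class-dependent cost, so that errors on legitimate mail (label -1) can be made more expensive than errors on spam (label +1). The function names (`asymmetric_boost`, `fit_stump`), the default costs, the loss, and the decision-stump base learner are all hypothetical choices made for this sketch.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _inv_one_plus_exp(z: float) -> float:
    """Numerically stable 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_stump(X: List[Vector], r: List[float]) -> Callable[[Vector], float]:
    """Least-squares regression stump (hypothetical base learner) fitted to residuals r."""
    best: Optional[Tuple[float, int, float, float, float]] = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for xi, ri in zip(X, r) if xi[j] <= t]
            right = [ri for xi, ri in zip(X, r) if xi[j] > t]
            lv = sum(left) / len(left)  # never empty: t is one of the data values
            rv = sum(right) / len(right) if right else 0.0
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda x: lv if x[j] <= t else rv


def asymmetric_boost(X: List[Vector], y: List[int], cost_pos: float = 1.0,
                     cost_neg: float = 5.0, rounds: int = 50,
                     lr: float = 0.1) -> Callable[[Vector], float]:
    """Gradient boosting on the assumed cost-weighted logistic loss
        sum_i c(y_i) * log(1 + exp(-y_i * F(x_i))),  y_i in {-1, +1}.
    cost_neg > cost_pos penalizes false positives (legitimate mail flagged as spam)."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the weighted loss w.r.t. F(x_i): c_i * y_i / (1 + exp(y_i * F_i))
        r = [(cost_pos if yi > 0 else cost_neg) * yi * _inv_one_plus_exp(yi * fi)
             for yi, fi in zip(y, F)]
        h = fit_stump(X, r)
        stumps.append(h)
        F = [fi + lr * h(xi) for fi, xi in zip(F, X)]
    # Score > 0 means "spam", score <= 0 means "legitimate".
    return lambda x: sum(lr * h(x) for h in stumps)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [-1, -1, -1, 1, 1, 1]
    score = asymmetric_boost(X, y)
    print([1 if score(x) > 0 else -1 for x in X])  # → [-1, -1, -1, 1, 1, 1]
```

With symmetric costs, a region containing equal numbers of spam and legitimate examples ends up with a score near zero; with `cost_neg` larger than `cost_pos`, the same region is pushed toward the legitimate class, trading some spam recall for fewer false positives.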
Similar resources
Research on E-mail Filtering Based On Improved Bayesian
Naïve Bayes has been widely used in spam filtering because it is simple and classifies texts accurately and quickly. However, during classification and filtering, the traditional method neither considers the differing features of spam and legitimate mail nor takes into account the loss incurred by misclassifying legitimate mail as spam, so there are ...
Ensemble of SVM Classifiers for Spam Filtering
Unsolicited commercial email, also known as spam, is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000), but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting and d...
Spam Source Clustering by Constructing Spammer Network with Correlation Measure
Spam filtering is one of the most challenging problems in electronic messaging systems. In general, recent studies on identifying the real spam source rely on content filtering, because spammers usually falsify their origin. We propose a method to identify spam sources based on structural analysis of a complex network. We assume that each spam source either has the same victim list or uses the same s...
A Comparative Performance Study of Feature Selection Methods for the Anti-spam Filtering Domain
In this paper we analyse the strengths and weaknesses of the most widely used feature selection methods in text categorization when they are applied to the spam problem domain. Several experiments with different feature selection methods and content-based filtering techniques are carried out and discussed. Information Gain, χ²-test, Mutual Information and Document Frequency feature selection methods ...
BiBoost for Asymmetric Learning
Although boosting has become an extremely important classification method, little attention has been paid to boosting with asymmetric losses. In this paper we take a gradient descent view of boosting in order to motivate a new boosting variant called BiBoost, which treats the two classes differently. This variant is likely to perform well when there is a different cost for false p...